Automatic Acquisition of Phrase Grammars for Stochastic Language Modeling
Abstract
Phrase-based language models have been recognized to have an advantage over word-based language models since they allow us to capture long-spanning dependencies. Class-based language models have been used to improve model generalization and overcome problems with data sparseness. In this paper, we present a novel approach for combining phrase acquisition with the class construction process to automatically acquire phrase-grammar fragments from a given corpus. The phrase-grammar learning is decomposed into two sub-problems, namely phrase acquisition and feature selection. The phrase acquisition is based on entropy minimization, and the feature selection is driven by the entropy reduction principle. We further demonstrate that the phrase-grammar-based n-gram language model significantly outperforms a phrase-based n-gram language model in an end-to-end evaluation of a spoken language application.
Similar resources
Treebank-Based Probabilistic Phrase Structure Parsing
The area of probabilistic phrase structure parsing has been a central and active field in computational linguistics. Stochastic methods in natural language processing, in general, have become very popular as more and more resources become available. One of the main advantages of probabilistic parsing is in disambiguation: it is useful for a parsing system to return a ranked list of potential sy...
Language model acquisition from a text corpus for speech understanding
Speech understanding can be viewed as a problem of translating input natural language of speech recognition results into output semantic language. This paper describes automatic acquisition of a language model for translating natural language into semantic language from a text corpus using a stochastic method. The method estimates co-occurrence probabilities of input and output grammar rules as...
Phrase Structure in a Computational Model of Child Language Acquisition
The problem of the acquisition of morpho-syntactic rules, as addressed by a number of existing computational models, is introduced. A distinction is made between ‘innatist’ models which presuppose the importance of innate linguistic knowledge (specifically, syntactic categories and X-Bar Theory), and ‘empiricist’ models, which reject such assumptions. It is argued that ‘empiricist’ models bette...
Finite-State Approximations of Grammars
Grammars for spoken language systems are subject to the conflicting requirements of language modeling for recognition and of language analysis for sentence interpretation. Current recognition algorithms can most directly use finite-state acceptor (FSA) language models. However, these models are inadequate for language interpretation, since they cannot express the relevant syntactic and semantic...
Computation of the Probability of the Best Derivation of an Initial Substring from a Stochastic Context-Free Grammar
Recently, Stochastic Context-Free Grammars have been considered important for use in Language Modeling for Automatic Speech Recognition tasks [6, 10]. In [6], Jelinek and Lafferty presented and solved the problem of computation of the probability of initial substring generation by using Stochastic Context-Free Grammars. This paper seeks to apply a Viterbi scheme to achieve the computation of th...